Congestion control for connectionless traffic in data networks via alternate routing.



A congestion control scheme for connectionless networks relieves congestion by routing a portion of
the traffic on a congested primary path onto a predefined alternate path constructed such that
loop-freedom is guaranteed. Explicit care is taken to avoid spreading congestion onto alternate paths.
The control actions are taken in a completely distributed manner, based on local measurements only,
and therefore no signaling messages need to be exchanged between nodes.

If desired, lower loss priority may be assigned to alternate routed traffic. Congestion is monitored
locally, and thresholds are defined to declare the onset and abatement of congestion. The present
invention affords at least an order of magnitude improvement in end-to-end cell blocking under
sustained focussed overload.

Technical Field

The present invention relates generally to data communications, and, in particular, to a congestion
control scheme for connectionless traffic in data networks.


Background of the Invention 

Connectionless data networks (such as the ARPANET network) permit the interchange of packetized data
between interconnected nodes without the need for fixed or centralized network routing administration.
Each node examines packet header information and makes routing decisions based only upon locally
available information, without explicit knowledge of where the packet originated or of the entire route
to the destination node. In this environment, traditional congestion control strategies such as window
flow control and per virtual circuit buffering and pacing cannot be used because of the absence of
end-to-end acknowledgements.

One congestion control approach that has been implemented in some connectionless networks is the use
of choke messages. In this method, a congested node sends feedback messages to other nodes, asking them
not to send traffic to it until further notification. There are several drawbacks to this approach.
First, by the time the choke message reaches the offending node, a substantial amount of traffic will
already have been transmitted. For example, in a network consisting of 150 Mbps trunks, a choke packet
sent on a 1000 mile long link takes 10 msec of propagation time. In this time, 1.5 Mbits are already
in transit and will contribute to the existing congestion. Secondly, in connectionless networks, there
is no knowledge of the path traversed by a packet before arriving at a given node; therefore, choke
messages may have to be sent to all the neighbors, including those that do not contribute to
congestion. This will lead to under-utilization of the network. Another difficulty with this method
is the action taken by a node upon receiving a choke packet. If it drops all packets headed towards
the congested node, then subsequent retransmissions will contribute to increased congestion. Since
there is no connection-oriented layer that the network interacts with, it is difficult to stop traffic
at the sources responsible for causing congestion. Therefore choke messages do not appear to be an
effective means of congestion control in connectionless networks.
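The arithmetic behind the choke message example can be checked with a short sketch. The function name
and parameter names below are illustrative assumptions; only the 150 Mbps trunk rate and the 10 msec
propagation time come from the text above.

```python
def bits_in_flight(link_rate_bps: float, propagation_delay_s: float) -> float:
    """Bits already on the link while a choke message is still propagating."""
    return link_rate_bps * propagation_delay_s


# Values taken from the example in the text: 150 Mbps trunk, 10 msec propagation time.
trunk_rate_bps = 150e6
propagation_s = 10e-3

in_transit = bits_in_flight(trunk_rate_bps, propagation_s)
print(f"{in_transit / 1e6:.1f} Mbits in transit")
```

The product of rate and delay grows linearly with trunk speed, which is why the drawback becomes more
pronounced in higher speed networks.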

Certain other approaches that have been tried in connectionless networks such as ARPANET involve
changing network routing in response to changes in traffic conditions, by dynamically recomputing paths
between nodes in a completely distributed fashion. This can be illustrated by considering the RIP
scheme which has been tried in ARPANET. In RIP, each node stores the entire network topology, and
periodically transmits routing update messages to its neighboring nodes. The routing update messages
provide reachability information which tells each neighboring node how the originating node can reach
the other nodes in the network, together with some measure of the minimum distance to the various
nodes. The measure of distance used is different in different versions of RIP. The original RIP
protocol used hop-counts to measure distance, while subsequent modifications use delay estimates to
reach a destination as a measure of distance.

The RIP scheme has several serious drawbacks. First, a large amount of information must be exchanged
between nodes in order to ensure consistent routing changes, and this itself may consume significant
network resources. Second, because paths are dynamically recomputed, there is serious potential for
problems such as packet looping, packet missequencing and route oscillations. Also, because of
propagation delay, the information exchanged between nodes may be outdated, and hence may not be
reliable for changing routing. This problem is especially serious in high speed networks (> 45 Mbps).

A second dynamic routing protocol called IGRP uses a composite metric which includes propagation delay,
path bandwidth, path utilization and path reliability, as a measure of distance. If the minimum
distance path is different from the one currently in use, then all the traffic is switched to the
newly computed shortest path. If a set of paths are "equivalent", load balancing is used.

When dynamic changes in routing are occasioned by the IGRP protocol, traffic shifts from one path to
another, so that congestion may be caused on the new path. Subsequent distance and shortest path
computation may then switch the traffic back onto the original path. In this manner each path would
experience oscillations in offered traffic and the end result may well be that neither path is fully
utilized. This problem may only be partly alleviated by averaging the distance measurements over an
interval of time before transmitting to the other nodes.

A third, very recently proposed enhancement to the ARPANET routing protocol, described in "An Extended
Least-Hop Distributed Routing Algorithm," written by D. J. Nelson, K. Sayood, and H. Chang, published
in IEEE Transactions on Communications, Vol. 38, No. 4, April 1990, pages 520-528, augments the set of
available shortest path routes to carry packets to a given destination by including routes that are
one hop longer than the shortest path routes. Each node maintains an estimate of the total delay
involved in reaching every destination. The route which has the minimum delay to a given destination
is then picked from the set of routes available to carry traffic to that destination. Although this
approach shows considerable improvement over the existing ARPANET routing, it also has several
disadvantages. First, the optimal, minimum delay path has to be chosen for each packet, leading to
increased processing in the switch. Second, at any given time, only one path is active and hence
there is no notion of load balancing. All traffic is routed on the same path until a path with a
better delay estimate is available. Third, nodes need to exchange delay information and hence some
form of signaling between nodes is necessary. Lastly, only paths that are one hop longer are
considered in addition to the shortest paths. Thus, some longer idle paths will not be chosen, even
though they could have successfully carried the traffic.

Yet another possibility for dealing with congestion is to try to reduce the impact of its
consequences. For example, one way of avoiding packet losses due to buffer overflow is to increase the
link buffer sizes. There is a serious drawback to this approach: if the buffer size is made very
large, cells will experience high queueing delays and end-to-end performance may be affected to the
extent that the end systems may time-out and retransmit. On the other hand, if the buffer size is
designed to keep the maximum queueing delay within acceptable bounds, then since the buffer occupancy
tends to increase exponentially as the link utilization approaches unity, buffers will eventually
overflow in the face of sustained focussed overload on the link and the resulting cell losses will
cause the end systems to retransmit. Thus, increasing the buffer size is not a viable congestion
control strategy.

Summary of the Invention

In accordance with the present invention, congestion caused by transient focussed overloads in
connectionless networks is relieved by routing a portion of traffic intended for a congested primary
path onto a predefined alternate path. An explicit algorithm is used for constructing alternate paths
in such a way that loop-freedom is guaranteed. Briefly, this is done by organizing the nodes that
neighbor a given node into layers such that nodes that are the same distance (in hops) from a given
destination are in the same layer. A weight is then assigned to each possible path between (a) the
given node and each neighbor in the same layer, and (b) each neighbor and a node in a closer layer
(in hops) to which the neighboring node is connected. The pairwise sum of the weights for each
combination of paths is then computed and the alternate path is determined as the path having the
minimum sum. Furthermore, care is taken to avoid spreading congestion onto alternate paths by marking
alternately routed packets, so that they are more readily dropped in the event that congestion is
again encountered at nodes further along the alternate path. By appropriately choosing threshold
values for initiating a transition to an alternate route and for reverting to a primary route, route
oscillations can be avoided. The routing determinations and network control actions are taken in a
completely distributed manner based on local measurements only, and therefore no signaling messages
or routing data need to be exchanged between network nodes. The invention is most useful in
conjunction with data networks where traffic tends to be very bursty, because when some paths are
busy, it is quite likely that others are relatively idle. Accordingly, when there is non-coincidence
of overloads on various parts of the network, our invention provides the greatest benefits.
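The use of separate thresholds for initiating and for reverting alternate routing can be sketched as
a simple hysteresis rule. This is a minimal illustration only: the class name, the use of queue
length as the locally measured quantity, and the numeric threshold values are all assumptions that do
not come from the text.

```python
class CongestionMonitor:
    """Declares congestion onset and abatement using two thresholds (hysteresis)."""

    def __init__(self, onset: int, abatement: int):
        if abatement >= onset:
            raise ValueError("abatement threshold must be below onset threshold")
        self.onset = onset          # assumed: queue length that starts alternate routing
        self.abatement = abatement  # assumed: queue length that ends alternate routing
        self.congested = False

    def update(self, queue_length: int) -> bool:
        """Feed one local measurement and return the current congestion state."""
        if not self.congested and queue_length >= self.onset:
            self.congested = True
        elif self.congested and queue_length <= self.abatement:
            self.congested = False
        return self.congested


monitor = CongestionMonitor(onset=80, abatement=40)
states = [monitor.update(q) for q in (10, 85, 60, 45, 39, 70)]
print(states)  # [False, True, True, True, False, False]
```

Because the state changes only when a measurement crosses the far threshold, measurements that hover
between the two values do not cause the route to flip back and forth.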

Brief Description of the Drawing 


The present invention will be more fully appreciated by reference to the following detailed
description, when read in light of the accompanying drawing in which:

Fig. 1 is a diagram illustrating the interconnection of an exemplary network having seven nodes;

Figs. 2-4 illustrate the "exclusionary trees" developed by one of the nodes in the network of Fig. 1;

Figs. 5-7 illustrate the "exclusionary trees" received by one of the nodes in the network of Fig. 1;

Figs. 8-10 illustrate the "exclusionary trees" of Figs. 5-7, respectively, which have been redrawn so
that each successive node descending from a root node is placed at the same vertical level;

Fig. 11 illustrates the result when the "exclusionary trees" of Figs. 8-10 are merged;

Fig. 12 illustrates the "layering" of nodes in a network with respect to a destination node;

Fig. 13 illustrates primary and alternate paths between some of the nodes in Fig. 12;

Figs. 14 and 15 illustrate undesirable single link looping between a pair of nodes;

Fig. 16 illustrates one example of our alternate routing technique as applied to a four node network
in which three nodes are located in a first layer and the fourth node is located in a second layer;

Fig. 17 is a redrawn version of Fig. 16 in which the fourth node is replaced by three "equivalent"
nodes;

Fig. 18 illustrates alternate and primary routing paths between a first node i and a second node j;

Fig. 19 illustrates alternate routing between the three nodes in the first layer of Fig. 17 that
would lead to undesirable looping;

Fig. 20 illustrates multiple nodes in a network, and arrangement of such nodes in k layers;

Fig. 21 is a flow chart illustrating the overall process for generating alternate routes in
accordance with the invention;

Figs. 22 and 23 are a flow chart (in two parts) that illustrates the process for performing layering
step 2103 of Fig. 21;

Figs. 24 and 25 are a flow chart (in two parts) that illustrates in more detail the process of
generating alternate routes in accordance with the invention;

Fig. 26 is a flow chart illustrating the process for giving higher priority to packets that are
routed on uncongested routes and for allowing marked packets that travel on alternate routes because
of congestion to be discarded in the event that heavy traffic is encountered;

Fig. 27 is a typical functional architecture for the nodes in the networks of Figs. 1-20;

Fig. 28 is a functional diagram of the arrangement of nodal processor 2730 of Fig. 27;

Fig. 29 is a three node network model used in simulations of the present invention;

Fig. 30 is a queueing model corresponding to the network of Fig. 29;

Fig. 31 is a graph illustrating blocking probability with and without alternate routing as a function
of offered load; and

Fig. 32 is a rescaled version of the graph of Fig. 31.

Detailed Description 

In order to fully understand the alternate routing technique of the present invention, it is
instructive to first consider one technique that can be employed to determine the primary path taken
by messages traveling between nodes in a connectionless network under normal (i.e., non-congested)
conditions. This technique is distributed adaptive minimum spanning tree routing, sometimes also known
as "exclusionary tree" routing, details of which can be found in Patent No. 4,466,060 issued to
G. G. Riddle on August 14, 1984. Other routing techniques are described in D. E. Comer's book,
"Internetworking With TCP/IP: Principles, Protocols and Architecture," Chapter 15: Interior Gateway
Protocols, Prentice Hall, 1988. The overall objective of the exclusionary tree technique is to
maintain a table of correct shortest paths to all destinations at each node of the network. For this
purpose, routing tables are initially constructed and updated whenever there are topological changes
in the network, as for example, when a node or link is added or deleted. The update procedures are
implemented at each node independently, in a distributed fashion. The resulting routing tables are
designed to yield minimum hop paths to all destinations in such a way that there is no looping.

Two principal steps are at the heart of the exclusionary tree routing technique: (1) each node sends
an exclusionary tree to each of its neighbors, and (2) a prescribed procedure is employed at each of
the nodes to merge the received exclusionary trees into a routing table. These two steps are repeated
at each node until the routing tables converge.

The exclusionary tree routing technique can be best described through the following example. Consider
the network consisting of seven nodes 1-7 shown in Fig. 1. Each node sends an exclusionary tree to
each of its neighbors. An exclusionary tree is the shortest path tree obtained after deleting all
links connected to the receiving node. Figs. 2-4 illustrate the exclusionary trees sent by node 1 to
its neighbors, namely nodes 6, 5 and 2, respectively. Figs. 5-7 show the exclusionary trees received
by node 1 from its neighbors, nodes 6, 5 and 2, respectively. The received exclusionary trees are
each first redrawn with their nodes descending from the root, each successive node being placed at a
vertical level corresponding to its distance in hops from node 1, as shown in Figs. 8-10. The merged
tree for node 1 shown in Fig. 11 is obtained by merging the exclusionary trees of Figs. 8-10 received
by node 1 from its neighbors, according to the following procedure. The received exclusionary trees'
nodes at a distance of one hop are visited from left to right (in the example, node 6, then node 5,
then node 2) and placed in the merged tree of Fig. 11. Next, nodes at a distance of two hops are
visited in the same order (left to right) and are attached to their parent nodes, if they are not
already there at a lesser distance. This procedure is repeated successively to create a merged tree.
If the node of interest is present in more than one received exclusionary tree at the same distance
in hops, then each root node is represented in the merged tree, resulting in multiple entries for
nodes that have multiple equal length routes. This situation did not occur in Fig. 11. Whenever
multiple equal length routes exist, traffic is distributed over all such routes so as to achieve load
balancing. It is to be noted here that other techniques can also be used to determine the primary
network routing used in the absence of congestion.
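The definition of an exclusionary tree given above (the shortest path tree obtained after deleting
all links connected to the receiving node) can be sketched with a breadth-first search. The seven
node topology below is hypothetical and is not the network of Fig. 1, which is not reproduced in the
text; the function name and the tie-breaking by ascending node number are likewise assumptions.

```python
from collections import deque


def exclusionary_tree(graph, sender, receiver):
    """Parent map of a minimum-hop tree rooted at `sender`, built as if every
    link connected to `receiver` had been deleted from the network."""
    parent = {sender: None}
    queue = deque([sender])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):  # assumed tie-break: lowest node number first
            if nbr == receiver or nbr in parent:
                continue  # the receiver is unreachable once its links are removed
            parent[nbr] = node
            queue.append(nbr)
    return parent


# Hypothetical topology (adjacency sets); NOT the network of Fig. 1.
graph = {
    1: {2, 5, 6},
    2: {1, 3},
    3: {2, 4},
    4: {3, 5},
    5: {1, 4},
    6: {1, 7},
    7: {6},
}

# Tree that node 1 would send to its neighbor, node 2.
tree_for_2 = exclusionary_tree(graph, sender=1, receiver=2)
print(tree_for_2)
```

In this sketch node 3 is reached through nodes 5 and 4 rather than through node 2, because the links
of the receiving node are excluded before the tree is built.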

EP 0 465 090 A1

In accordance with our invention, during times of congestion, some fraction of the packets normally
routed on primary routes are instead routed on secondary or alternate paths that are lightly loaded.
The manner in which alternate routes are selected will be better understood by first considering an
arbitrary network which is depicted in the form of a layered architecture in Fig. 12. The layering in
Fig. 12 is with respect to destination node D, such that nodes 1231-1233 in layer k (k is an integer)
have at least one k-hop shortest path to D. This means that every node in layer k must have at least
one link connecting it to a node in layer (k-1). If a node in layer k is connected to more than one
node (1221-1223) in layer (k-1), then it has more than one k-hop shortest path to D. These are
precisely the multiple shortest paths constructed by the exclusionary tree routing algorithm
described above. By exploiting the connectivity between nodes 1231-1233 in layer k, our technique is
used to generate loop-free alternate paths to D which are at least of length (k+1) hops. There are
two ways of doing this. Both assume that only shortest path primary routes are permitted.
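The layering with respect to a destination node can be sketched as a breadth-first search outward
from D, grouping nodes by hop count. The five node topology and the function name are hypothetical
and serve only to illustrate the idea; they do not correspond to Fig. 12.

```python
from collections import deque


def layers_from(graph, destination):
    """Group nodes into layers by minimum hop count to `destination`."""
    hops = {destination: 0}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    layers = {}
    for node, k in hops.items():
        layers.setdefault(k, set()).add(node)
    return layers


# Hypothetical topology: nodes a and b are one hop from D, nodes c and d are two hops away.
graph = {
    "D": {"a", "b"},
    "a": {"D", "c"},
    "b": {"D", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

layers = layers_from(graph, "D")
print(layers)
```

Here node c has two 2-hop shortest paths to D (through a and through b), and the link between c and
d is an intra-layer link of the kind that alternate routing exploits.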

In the first method, if all nodes are numbered, and if we let nodes i and j (i and j are integers)
belong to layer k and let nodes i and j be connected by a link, then node i can alternate route
packets intended for destination D via node j if i<j. This method is loop-free, because the primary
routes are hierarchical shortest path routes, while the secondary (alternate) routes essentially
create a hierarchy within layer k. For example, Fig. 13 shows 4 nodes numbered 1301-1304 in layer k
connected to 4 nodes numbered 1311-1314 in layer (k-1).

In Fig. 13, routing choices marked 1 are primary routes and routing choices marked 2 are secondary
routes. It is clear that no inter-layer looping is possible since there is no downward routing: a
node in layer (k-1) cannot route to a node in layer k. No intra-layer looping is possible, because
node 1304 cannot alternate route to any of the other (lower numbered) nodes. This first method is
simple to implement, but has the disadvantage that the highest numbered node (e.g., node 1304) in
every layer is denied an alternate route. This disadvantage is overcome, albeit at the cost of added
complexity, in the second method.
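The first method reduces to a one-line comparison of node numbers. The sketch below assumes, purely
for illustration, that nodes 1301-1304 are fully connected within layer k; the actual intra-layer
links of Fig. 13 are not given in the text, and the function name is hypothetical.

```python
def alternate_candidates(i, same_layer_neighbors):
    """First method: node i may alternate route via node j only if i < j."""
    return sorted(j for j in same_layer_neighbors if i < j)


# Assumed for illustration: nodes 1301-1304 are fully connected within layer k.
layer_k = {1301, 1302, 1303, 1304}
candidates = {i: alternate_candidates(i, layer_k - {i}) for i in sorted(layer_k)}
print(candidates)
```

Every alternate hop moves to a strictly higher numbered node, so a packet can never return to a node
it has already visited within the layer, and the highest numbered node has an empty candidate list.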

In the second method, we require two additional pieces of information. These are:

(i) the ability to avoid single link loops of the form shown in Fig. 14, wherein node i must
recognize that a packet was routed to it by node j, and must prevent the packet from going back to
node j. This is necessary to avoid looping between nodes i and j when i and j route to each other on
a second choice basis, as shown in Fig. 15; i.e., when the primary paths out of both nodes i and j
are unavailable (due to congestion), packets must not be allowed to loop between i and j, but should
be dropped.

(ii) Every node i must be assigned weights w(i,j) with respect to all other nodes j to which it is
connected. Further, the weights must be chosen so that:

(a) they are symmetric, i.e., w(i,j) = w(j,i) for all i and j, and

(b) w(i,j) + w(j,k) is unique, in the sense that w(i,j) + w(j,k) = w(i,l) + w(l,m) => j=l and k=m.

This condition means that for any two nodes that have at least one two-hop path connecting them,
there is a unique minimum weight two-hop path connecting the two nodes. One way of satisfying
condition (b) is to choose weights so that the pairwise sum is unique, i.e., such that no two sums
are the same. As will be described below, the weight information can be transmitted to each node
together with the exclusionary tree routing information. The reason why the assignment of these
weights is necessary will be also explained below.
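Conditions (a) and (b) can be checked mechanically for a given weight assignment. In the sketch
below, symmetry is obtained by construction, because each weight is stored once per undirected link;
the three node topology, the weight values, the function names, and the choice to ignore two-hop
paths that return to their starting node are all assumptions made for illustration.

```python
def weight(w, a, b):
    """Symmetric lookup: w(a,b) and w(b,a) share one entry keyed by the link."""
    return w[frozenset((a, b))]


def two_hop_sums_unique(graph, w):
    """Check condition (b): from every node i, no two distinct two-hop paths
    i -> j -> k have the same total weight (paths back to i are ignored)."""
    for i in graph:
        seen = {}
        for j in graph[i]:
            for k in graph[j]:
                if k == i:
                    continue
                total = weight(w, i, j) + weight(w, j, k)
                if total in seen and seen[total] != (j, k):
                    return False
                seen[total] = (j, k)
    return True


# Hypothetical triangle network with two candidate weight assignments.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
distinct = {frozenset("ab"): 1, frozenset("bc"): 2, frozenset("ac"): 4}
equal = {frozenset("ab"): 1, frozenset("bc"): 1, frozenset("ac"): 1}

print(two_hop_sums_unique(graph, distinct))  # True
print(two_hop_sums_unique(graph, equal))     # False
```

With equal weights, the paths a-b-c and a-c-b both sum to 2, so node a would have no unique minimum
weight two-hop path; distinct powers of two avoid such ties.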

Under the above conditions, the fact that our alternate routing technique is loop-free can be
demonstrated as follows:

Let nodes 1, 2, ..., m be in layer k and nodes 1', 2', 3', ..., m' in layer (k-1). The nodes in layer
(k-1) may be repeated and are not necessarily unique (for notational convenience). Let us suppose
that node i in layer k is connected to node i' in layer (k-1). There is no loss of generality in
doing this because, even if layer (k-1) has a single node, it can be repeated m times. For example,
Fig. 16, which shows links between nodes 1-3 of layer k and node 4 of layer k-1, may be redrawn as
Fig. 17, in which node 1 is linked to node 1', node 2 is linked to node 2' and node 3 is linked to
node 3', as long as nodes 1', 2' and 3' are each "equal" to node 4.

The route i->i' is always the primary route from node i, for all packets to a particular destination
D (with respect to which the network has been layered). With this notation in mind, our loop-free
alternate routing technique may be expressed in the following manner:

Routing Rule: Let nodes i, j and ℓ belong to layer k and nodes j' and ℓ' belong to layer k-1. Then,
node i alternate routes to node j if and only if

w(i,j) + w(j,j') < w(i,ℓ) + w(ℓ,ℓ') for all ℓ ≠ j. (1)


Equation 1 is illustrated diagrammatically in Fig. 18, in which nodes i, j and j' are shown. In that
figure, the link between nodes i and j is marked 2, indicating that this is the secondary path from
node i to node j in layer k; the link between nodes j and j' is marked 1, indicating that this is the
primary path from node j in layer k to node j' in layer k-1. In accordance with our technique,
w(i,j)+w(j,j') is then the unique minimum weight 2-hop path to get from node i to any node in layer
(k-1).
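The routing rule amounts to picking the same-layer neighbor j that minimizes w(i,j)+w(j,j'). The
sketch below applies it to three fully connected nodes in layer k, in the manner of Fig. 17. The
weight values (distinct powers of two, so that all pairwise sums are unique) and the function name
are assumptions chosen for illustration only.

```python
def choose_alternate(i, same_layer_neighbors, primary_next_hop, w):
    """Return the neighbor j minimizing w(i,j) + w(j,j'), where j' is the
    primary next hop of j, or None if i has no same-layer neighbor."""
    def cost(j):
        return w[frozenset((i, j))] + w[frozenset((j, primary_next_hop[j]))]
    return min(same_layer_neighbors, key=cost) if same_layer_neighbors else None


# Nodes 1-3 in layer k, each with primary next hop 1', 2', 3' in layer k-1.
primary = {1: "1'", 2: "2'", 3: "3'"}
# Hypothetical symmetric weights, one entry per link.
w = {
    frozenset((1, 2)): 1,
    frozenset((1, 3)): 2,
    frozenset((2, 3)): 4,
    frozenset((1, "1'")): 8,
    frozenset((2, "2'")): 16,
    frozenset((3, "3'")): 32,
}

alternates = {i: choose_alternate(i, [j for j in primary if j != i], primary, w)
              for i in primary}
print(alternates)  # {1: 2, 2: 1, 3: 1}
```

With these weights nodes 1 and 2 choose each other as alternates, which is exactly the single link
situation that each node must detect and break by dropping the packet.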

The loop-free property of our technique can be demonstrated by first considering the case when m=3,
i.e., three nodes in layer k. Assume that nodes in layer k are fully connected. The corresponding
network is shown in Fig. 17. The connectivity between nodes in layer (k-1) is not important, and,
hence, is not shown. The only possibility of looping occurs if each node in layer k alternate routes
to a node in layer k that has not previously served as an alternate. For example, the situation
illustrated in Fig. 19 is a loop, because node 1 alternate routes (marked "2") to node 2, node 2
alternate routes to node 3, and node 3 alternate routes back to node 1. Using our technique, such a
loop cannot occur, because if node 1 alternate routes to node 2, and node 2 alternate routes to node
3, then node 3 must necessarily alternate route to node 2 rather than to node 1. Now, the fact that
node 1 alternate routes to node 2 implies that

w(1,2) + w(2,2') < w(1,3) + w(3,3') (2)

Next, the fact that node 2 alternate routes to node 3 implies that

w(2,3) + w(3,3') < w(2,1) + w(1,1') (3)

Adding inequalities (2) and (3) and noting that w(i,j) = w(j,i), we get

w(3,2) + w(2,2') < w(3,1) + w(1,1') (4)

which implies that node 3 alternate routes to node 2.
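The three node argument can also be checked by exhaustive enumeration. The sketch below assigns six
distinct powers of two to the six links of Fig. 17 in every possible order and counts how often the
minimum-sum rule produces a three link loop. The specific weight values and the function names are
assumptions; powers of two are used only because they make every pairwise sum unique.

```python
from itertools import permutations

NODES = (1, 2, 3)
INTRA_LINKS = [frozenset(p) for p in ((1, 2), (1, 3), (2, 3))]


def alternates(w_intra, w_down):
    """Alternate next hop of each node: the j minimizing w(i,j) + w(j,j')."""
    alt = {}
    for i in NODES:
        others = [j for j in NODES if j != i]
        alt[i] = min(others, key=lambda j: w_intra[frozenset((i, j))] + w_down[j])
    return alt


def has_three_link_loop(alt):
    """True when the alternates form a cycle through all three nodes,
    i.e. no node is sent straight back by its own alternate."""
    return all(alt[alt[i]] != i for i in NODES)


loops = 0
for perm in permutations((1, 2, 4, 8, 16, 32)):
    w_intra = dict(zip(INTRA_LINKS, perm[:3]))  # weights w(i,j) inside layer k
    w_down = dict(zip(NODES, perm[3:]))         # weights w(j,j') to layer k-1
    if has_three_link_loop(alternates(w_intra, w_down)):
        loops += 1

print(loops)  # 0
```

No assignment yields a three link loop, consistent with inequality (4); single link loops between
two nodes do still arise and are handled separately.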

Thus, a three link loop cannot occur. However, a single-link loop may occur and hence the nodes must
have the ability to recognize and prevent a single-link loop. Such a capability can be implemented
simply in each node by preventing messages or packets from departing from the node on the same link
that they arrived on. This is discussed in more detail below.
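The single-link loop check can be sketched as a small forwarding decision. The function name, the
link labels, and the representation of a dropped packet as None are illustrative assumptions; the
rule itself (never send a packet out on the link it arrived on) is the one stated above.

```python
def select_outgoing_link(arrival_link, primary_link, alternate_link, primary_congested):
    """Pick the outgoing link for a packet, or return None to drop it.

    A packet is never sent back out on the link it arrived on, which
    prevents two nodes from bouncing it between each other."""
    candidate = alternate_link if primary_congested else primary_link
    if candidate is None or candidate == arrival_link:
        return None  # drop rather than loop
    return candidate


# Hypothetical link labels at node i: primary link to i', alternate link to j.
print(select_outgoing_link("from-source", "i-i'", "i-j", primary_congested=False))
print(select_outgoing_link("from-source", "i-i'", "i-j", primary_congested=True))
print(select_outgoing_link("i-j", "i-i'", "i-j", primary_congested=True))
```

The third call models the case of Fig. 15: the packet was alternate routed to node i by node j, the
primary path out of node i is also congested, and the packet is dropped instead of being returned.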

It should be noted that if the weights w(i,j) are chosen to be the actual distance d(i,j) between
nodes i and j, then our invention leads to shortest distance alternate routing, which would be very
important in a geographically dispersed network. However, while the symmetry property, viz.,
d(i,j) = d(j,i), is satisfied, the uniqueness property is not guaranteed. To ensure uniqueness, the
internodal distances may have to be infinitesimally perturbed so that if

d(i,j) + d(j,j') = d(i,k) + d(k,k'), (5)

then d(i,j) is changed to d(i,j) + ε, where ε is an arbitrarily small number. However, it may be
noted that since d(i,j) are real numbers, practically speaking the uniqueness condition is generally
satisfied. The distance information can easily be provided to each node when the distributed shortest
path route is determined, by providing V,H coordinates. If (Vi, Hi) and (Vj, Hj) are the coordinates
of two connected nodes i and j, the distance d(i,j) can be computed directly from them.

Other fully distributed techniques can be used to find a mapping between the nodes i, j and j' used
in alternate routing and the weights w(i,j), w(j,j') associated with the links between the nodes.
For example, if each node i, j, j' has a unique integer number, then w(i,j) can be arbitrarily
defined as (i^q + j^q) and w(j,j') can be likewise defined as (j^q + j'^q), where q is a suitably
chosen integer. Other mappings (i,j) -> w(i,j) such that w(i,j) = w(j,i) and
w(i,j) + w(j,j') = w(i,ℓ) + w(ℓ,ℓ') => j = ℓ can also be found. Alternatively, weight assignments may
be centrally administered, and the appropriate weights periodically downloaded to each node when
there is a topographical change in the network, without significantly degrading the performance of
the network.

It is clear from the previous discussion that the topological information needed to construct
loop-free alternate routing in accordance with our invention is the layering of the network with
respect to every destination node. This information can be readily obtained from the exclusionary
tree information which is already available in each node. Consider the layering with respect to
destination node D shown in Fig. 20. Suppose that the node at which we are constructing the routing
table is node S in layer k. Let node S be connected to nodes S1, ..., Sk in layer k. Now, node S
knows through its own primary routing table (constructed using exclusionary tree routing) that it is
in layer k with respect to D. It also knows from the exclusionary trees received from S1, ..., Sk
that they are also in layer k with respect to D. There may be other nodes in layer k which S does
not know about. But this does not matter, as S is not connected to those nodes and hence could not
alternate route to any of them. The key is that the exclusionary tree information is sufficient for
a node to determine which of its neighbors are in the same layer as itself with respect to any given
destination node. (This is unlike a centralized algorithm, in which all nodes have global knowledge
of the network topology and, hence, every node knows all the other nodes in its layer. This is more
information than needed, since a node only has to know the other nodes in the layer to which it is
connected.)

The overall process by which each node determines its secondary routing table is shown in the flow
chart of Fig. 21. Initially, in step 2101, network topology information, i.e., the identity of nodes
that neighbor the current node, is determined from the exclusionary tree routing information already
available in the node. If another technique is used to generate the primary route, it is nevertheless
assumed that this topology information is at hand. Likewise, in step 2102, the weights w(i,j)
associated with the paths between the current node and its neighbors are computed from V,H
coordinates if internodal distance is used as the weighting criterion, as described above. Otherwise,
the appropriate weights are stored in the node.

Next, for a destination D, the network of nodes is organized into layers in step 2103, using the
process described in more detail in Figs. 22 and 23. As stated previously, each layer contains all
nodes having the same distance, in hops, to the destination, using the shortest available path.

In step 2104 an alternate route to destination D is generated, using the process described in more
detail in Figs. 24-25. This process is distributed, i.e., it is performed in each node independently.
After the alternate route for a given destination has been determined, a decision is made in step
2105 as to whether all destinations have been processed. If not, steps 2103 and 2104 are repeated for
the other destinations. If routes for all destinations have been determined, the process stops in
step 2106.

The layer generation or organization step 2103 of Fig. 21 is illustrated in more detail in Figs. 22
and 23. Initially, in step 2201, the identity of each of the neighbors of the current node i is
stored in a memory or other suitable storage device. This information would be available at each
node if the exclusionary tree routing process is used. In step 2202 the shortest distance (k), in
hops, between i and the destination node D is computed. This information is also available as a
result of the exclusionary tree routing process. It should be noted, however, that any other
distributed shortest path algorithm can be used for primary route selection and any such algorithm
would give us the shortest distance in hops. A similar procedure is then repeated for each neighbor
j of i, in step 2203, to determine the distance m in hops between j and D. When the results of steps
2202 and 2203 are both available, a comparison between m and k is made in steps 2204 and 2214. If m
and k are determined to be equal in step 2204, then it is concluded that j and i are in the same
layer k (step 2205), and this information is stored (step 2206). If it is determined that m = k - 1
in step 2214, then it is concluded that j is in layer k-1 (step 2215) and this information is stored
(step 2216). If the results of steps 2204 and 2214 indicate that m does not equal k or k-1, then it
is concluded that j is in a more distant layer k+1 from D (step 2225). This information is not
needed, and is therefore discarded in step 2226.

The layering process is further described in Fig. 23, which is a continuation of Fig. 22. After a
particular neighbor j of the current node, i, has been examined to determine whether it is in the
same layer k, a closer layer k-1, or a more distant layer k+1, a determination is made in step 2230
as to whether all neighbors j of node i have been examined. If not, the portion of the process
beginning at step 2203 is repeated. After all neighbors of node i have been examined, a
determination is made in step 2240 as to whether all destinations D have been examined. If not, the
entire layer generation process, beginning at step 2202, is repeated for the next destination. When
all destinations have been examined, the layering process is stopped in step 2250.

The alternate route generation process of step 2104 of Fig. 21 is described in more detail in Figs.
24 and 25. Initially, for each destination D, all neighbors of node i that are in the same layer k
as i (with respect to destination D) are stored in step 2401. This step thus uses the layering
information previously obtained from the procedure described above in connection with Figs. 22 and
23. In a similar manner, all neighbors of node i in layer k-1 (with respect to destination D) are
also stored, in step 2402. After information regarding the neighboring nodes has been stored, a
determination is made in step 2403 of the weight w(i,j) associated with the link between nodes i and
j within layer k. Similarly, for each neighbor j' of node j in layer k-1, a determination is made in
step 2404 of the weight w(j,j') associated with the link between node j in layer k and node j' in
layer k-1. The sum of the weights w(i,j) and w(j,j') is next computed and stored in step 2405. At
this point, a determination is made in step 2406 as to whether all neighbors j' of node j have been
examined. If not, steps 2404 and 2405 are repeated for the next neighbor j'. After all neighbors j'
have been examined, a determination is made in step 2407 (now referring to Fig. 25) as to whether
all neighbors j of node i have been examined. If not, the computation process beginning with step
2403 is repeated. After all neighbors j of node i have been examined, the stored combined weights
are processed in step 2408 to select nodes j and j' such that w(i,j)+w(j,j') is a minimum. This
minimum value determines the specific node j that is the alternate route for traffic from node i
that is destined for node D.

After the alternate route for a specific destination D has been computed, a determination is made in step
2409 as to whether all destinations D have been processed. If not, the alternate route generation process
beginning at step 2401 is repeated, so that a table of alternate routes, one for each destination, can be formed. When
all destinations D have been processed, the process is stopped in step 2410.
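
The selection of the alternate next hop for one destination can be sketched as a minimization over pairs of links. The sketch below is a hedged illustration only: the function name `select_alternate`, the `closer_of` mapping (each same-layer neighbor j to its neighbors j' in layer k-1) and the `weight` dictionary keyed by node pairs are all assumptions made for the example, and ties are broken by keeping the first pair found.

```python
def select_alternate(i, same_layer, closer_of, weight):
    """Pick the neighbor j in i's own layer that minimizes w(i,j) + w(j,j').

    same_layer -- neighbors j of i in layer k
    closer_of  -- dict: j -> list of neighbors j' of j in layer k-1
    weight     -- dict: (node, node) -> link weight
    Returns (total_weight, j, j_prime), or None if no candidate exists.
    """
    best = None
    for j in same_layer:                      # loop closed by step 2407
        for j_prime in closer_of.get(j, []):  # loop closed by step 2406
            total = weight[(i, j)] + weight[(j, j_prime)]
            if best is None or total < best[0]:
                best = (total, j, j_prime)
    return best
```

Repeating the call for every destination and recording the returned j gives one alternate-route entry per destination.
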

In order to avoid the spread of congestion caused by alternate routing, another feature of our invention is
the marking of a bit in the header of all packets that are routed on the alternate path. At all nodes in the alternate
path, marked packets are given lower loss priority. This means that if the buffer occupancy at these nodes is
below a preset threshold, then the marked packet is admitted; otherwise it is discarded. If the alternate path is
also busy, then the alternate routed traffic is dropped and the spread of congestion is avoided. This process
is illustrated in Fig. 26.

For each link outgoing from a node, a periodic measurement is made in step 2601 of the occupancy x of
the buffer associated with that link. If x is determined to be less than a threshold value T_on in step 2602, traffic
on that link is uncongested, so that the uncongested routing table is selected in step 2603. If x is also less than
a threshold value T_accp, both marked and unmarked packets are accepted for transmission over that link.
However, if x is greater than or equal to T_accp, traffic on the link is relatively heavy (but still uncongested). In that
circumstance, only unmarked packets are accepted in step 2607 for transmission over that link.

If it is determined in step 2602 that buffer occupancy x is equal to or greater than T_on, the alternate route
is selected in step 2604. However, in this event, only a preselected fraction of packets are actually diverted from
the primary route and routed on the alternate link. A test is next performed in step 2620 to determine if the
outgoing link selected by alternate routing is connected to the same node as the node from which the packet was
received. This test is performed in order to avoid single link loops, and is based on information relating to
incoming and outgoing links that is readily available in the node. If the test result is positive, so that a single link
loop would be created, the packet is instead dropped or discarded in step 2621. Otherwise, a determination is
made in step 2608 as to whether the alternate trunk was used to route the packet to the next node. If yes, the
marking of that packet occurs in step 2609, so that the status of that packet as one having been alternate routed
will be recognized in succeeding nodes. On the other hand, if alternate routing is not used, the packet is not
marked (step 2610).
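
The per-packet decision described in connection with Fig. 26 can be sketched as follows. This is a simplified illustration under several assumptions that the text does not spell out: a packet already marked at an earlier node is assumed to continue on the primary route and to be admitted only below the acceptance threshold; a newly diverted packet is assumed to be checked against the acceptance threshold of the alternate link's buffer at the diverting node; and the choice of which packets fall in the preselected diverted fraction is passed in as the flag `divert`. The names `Packet`, `route_packet`, `t_on` and `t_accp` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str
    marked: bool = False  # loss priority bit, set once the packet is alternate routed


def route_packet(pkt, prev_hop, occupancy, t_on, t_accp, primary, alternate, divert):
    """Return (next_hop, pkt), or None when the packet is discarded.

    occupancy -- dict: next-hop node -> measured occupancy of that link's buffer
    primary   -- dict: destination -> next hop in the uncongested routing table
    alternate -- dict: destination -> next hop in the congested routing table
    divert    -- True if this packet belongs to the preselected diverted fraction
    """
    hop = primary[pkt.dest]
    x = occupancy[hop]
    if pkt.marked:
        # Assumption: marked packets stay on the primary route and have low priority.
        return (hop, pkt) if x < t_accp else None
    if x < t_on or not divert:
        return hop, pkt                # primary route, packet stays unmarked
    alt = alternate[pkt.dest]
    if alt == prev_hop:
        return None                    # would create a single link loop
    if occupancy[alt] >= t_accp:
        return None                    # assumption: alternate link too busy to accept
    pkt.marked = True                  # mark so that succeeding nodes recognize it
    return alt, pkt
```

With an onset threshold of 70 and an acceptance threshold of 50, for instance, an unmarked packet whose primary link buffer holds 80 cells is diverted and marked only if the alternate link buffer holds fewer than 50 cells and the alternate link does not lead back to the previous node.
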

In accordance with another aspect of our invention, we have found it advantageous to use link buffer
occupancy as a measure of link congestion, to determine when alternate routing should be applied. The activation
and deactivation of alternate routing, as well as the decision to accept or reject an alternate routed cell, would
then be based upon measurements of link buffer occupancy. Various specific buffer monitoring techniques can
be used for this purpose, depending upon implementational convenience. For example, since link buffer
occupancy fluctuates at great speed, it can be measured every millisecond. A running average of the 1000 most
recent measurements can then be used to monitor congestion. When the average buffer occupancy exceeds
a predetermined congestion threshold, some of the traffic is alternate routed, and these packets are marked
by setting a loss priority bit in the header.
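
One possible way to keep such a running average is sketched below. It is an assumption-laden illustration rather than a prescribed monitoring technique: the class name `OccupancyMonitor` and its methods are invented, the caller is assumed to supply one sample per measurement interval (for example, every millisecond), and the average is taken over however many samples are available until the window fills.

```python
from collections import deque


class OccupancyMonitor:
    """Running average over the most recent `window` buffer occupancy samples."""

    def __init__(self, window=1000):
        self.samples = deque(maxlen=window)
        self.total = 0

    def add(self, occupancy):
        """Record one measurement and return the current running average."""
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]  # oldest sample is about to be evicted
        self.samples.append(occupancy)
        self.total += occupancy
        return self.total / len(self.samples)

    def congested(self, threshold):
        """True when the average occupancy exceeds the congestion threshold."""
        return bool(self.samples) and self.total / len(self.samples) > threshold
```

Keeping a running total makes each update constant-time, which matters when a sample is taken every millisecond on every outgoing link.
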

Fig. 27 illustrates, in simplified form, the functional architecture for a typical node 2701. As shown in that
figure, node 2701 interconnects a series of incoming links 2710-2712 with a series of outgoing links 2720-2722.
Links 2710-2712 and links 2720-2722 may in some implementations each be one or more high speed data
trunks. Input buffers 2715-2717 receive packets applied on links 2710-2712, respectively, and apply the
packets to a nodal processor 2730 to be described below. Likewise, output buffers 2725-2727 receive packets output
from nodal processor 2730 that are destined for links 2720-2722, respectively. The occupancy or fullness of
output buffers 2725-2727 is monitored in a congestion monitor 2740, which is part of nodal processor 2730,
to determine when one or more links 2720-2722 is congested. The output of congestion monitor 2740 controls
nodal processor 2730 such that a primary route to a destination is selected from table 2750 in the absence of
congestion and an alternate route to a destination is selected from table 2760 in the presence of congestion.
Nodal processor 2730 also includes a single link loop avoidance processor 2770, which is activated when
congestion routing is used. The purpose of this processor is to assure that a packet originating at a neighboring
node is not sent back to that node, so as to avoid forming a single link loop. This may be accomplished by
keeping track of the input link on which a packet is received, and dropping the packet (i.e., not transmitting it) if the
congested route specified by congested routing table 2760 is on a link back to the same node.

A more complete functional description of the arrangement of nodal processor 2730 is contained in Fig.
28. Nodal processor 2730 contains a central processing unit (CPU) 2810 and a memory 2850 having several
portions. The network layering information that results from the process illustrated in Figs. 22 and 23 is stored
for each destination node in portion 2802 of memory 2850, while network topology information is stored in
another portion 2801 of the same memory 2850. Weights corresponding to different node pairs are also stored
in the same portion of memory 2850.

Whenever there is a change in the network topology, the new network layering is calculated for each
destination node and stored in portion 2802. CPU 2810 then uses the network layering information and the weight
information to compute primary and alternate paths, which are stored in portions 2820 and 2830 of memory
2850. Persons skilled in the art will recognize that various implementations for CPU 2810 and memory 2850
are readily available.

The benefits afforded by our alternate routing technique can be illustrated using a simple 3 node model of
Fig. 29, which permits computation of end-to-end blocking with and without alternate routing for various offered
loads. Based on this analysis, we have determined that alternate routing provides very significant improvements
in end-to-end blocking.

In Fig. 29, node 2901 has two traffic streams, one destined for node 2902 and the other for node 2903.
The traffic destined for node 2902 has a mean arrival rate of λ_12 and the traffic destined for node 2903 has a
mean arrival rate of λ_13. Node 2902 has a single traffic stream with mean arrival rate λ_23 destined for node 2903.
Let us suppose that n_12, n_13 and n_23 are buffers in which cells from the traffic streams corresponding to λ_12, λ_13
and λ_23 queue up for service. All queues are first-in, first-out (FIFO). All arrivals are assumed Poisson and all
service times are exponential. It is assumed that there is no receive buffer overflow and, hence, we do not model
the receive buffers.

Using this model, the impact of alternate routing on the λ_13 traffic is examined by subjecting a fraction of
the λ_13 traffic to alternate routing so that, if the occupancy in buffer n_13 exceeds a certain specified threshold,
called the rejection threshold, the alternate routable fraction of λ_13 is offered to buffer n_12, for transmission
through node 2902. Buffer n_12 accepts the alternate routed traffic only if its occupancy is below a specified threshold,
called the acceptance threshold; if not, it gets rejected and is lost. Once the alternate routed traffic reaches
node 2902, it is accepted by buffer n_23 only if its occupancy is below the acceptance threshold. It is important
to note that the node 2901 to 2902 and node 2902 to 2903 traffic streams are not subject to alternate routing.
This is because we wish to study the impact of alternate routing on the end-to-end blocking of the node 2901
to 2903 traffic, as we increase λ_13 while keeping λ_12 and λ_23 constant. The queueing model corresponding to
the network in Fig. 29 is shown in Fig. 30.

In Fig. 30, λ_13d denotes the direct routed component of λ_13, and λ_13a denotes the alternate routable portion
of λ_13. The overall arrival rate for the node 2901 to 2903 traffic is λ_13 = λ_13d + λ_13a. Using a birth-death process
model, we have derived exact expressions for the end-to-end blocking suffered by the three traffic classes. In
our analysis, we assumed a buffer size of 100 for n_12, n_13 and n_23, since it yields a cell blocking probability of
roughly 10^-6 at an offered load of 0.9. We chose the rejection threshold to be 70 and the acceptance threshold
to be 50. This means that whenever the occupancy of buffer n_13 exceeded 70, the cells from the λ_13a stream
are alternate routed to buffer n_12. Buffer n_12 accepts the alternate routed λ_13a cells only if its occupancy is below
50. Similarly, the alternate routed λ_13a cells are accepted by buffer n_23 for transmission to node 2903 only if the
occupancy at buffer n_23 is below 50. All cells that are not accepted are lost. In this simple model, we have not
accounted for message retransmission. We kept the offered load due to λ_12 and λ_23 constant at 0.8 and varied
the offered load due to λ_13 from 0.5 to 2.0. 25% of the node 2901 to 2903 traffic was subject to alternate routing. The
end-to-end blocking suffered by the node 2901 to 2903 traffic at these various loads, with and without alternate
routing, is shown in Fig. 31. Curve 3101 gives the blocking probability without alternate routing and curve 3102
gives the blocking probability with alternate routing. From Fig. 31, it is clear that there is substantial improvement
in end-to-end blocking with alternate routing. Fig. 31 does not exhibit the sharp increase in blocking that
normally occurs with other alternate routing techniques that do not mark packets to avoid the spread of congestion,
as advantageously provided in our invention. Fig. 32 is a rescaled version of Fig. 31 showing the end-to-end
blocking experienced by the node 2901 to 2903 traffic when the offered load ranges from 0.8 to 1.2. Again,
curve 3201 represents blocking probability without alternate routing and curve 3202 represents blocking
probability with alternate routing. Fig. 32 clearly shows the dramatic improvement in end-to-end blocking for the node
2901 to 2903 traffic over a range of offered load of practical interest. Because direct routed traffic is given priority
(alternate routed traffic is accepted only if the buffer occupancy is below 50), the node 2901 to 2902 and node
2902 to 2903 traffic suffer no significant performance degradation, even when the offered load due to the node
2901 to 2903 traffic is 2.0. The end-to-end blocking for the node 2901 to 2902 and node 2902 to 2903 traffic
remains virtually at zero.
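
As a rough sanity check on the buffer sizing used in the analysis above, the blocking probability of a single finite buffer with Poisson arrivals and exponential service can be computed from the standard birth-death (M/M/1/K) result. This sketch is not the exact three-class expression referred to in the text, which is not reproduced here; the function name `mm1k_blocking` is hypothetical, and the buffer size is assumed to count the cell in service.

```python
def mm1k_blocking(rho, capacity):
    """Blocking probability of an M/M/1 queue that holds at most `capacity` cells.

    rho      -- offered load (arrival rate divided by service rate)
    capacity -- maximum number of cells in the system
    """
    if rho == 1.0:
        return 1.0 / (capacity + 1)  # all states equally likely when rho is 1
    # Steady-state probability of the full state in the birth-death chain.
    return (1.0 - rho) * rho**capacity / (1.0 - rho**(capacity + 1))
```

For a buffer of 100 cells at an offered load of 0.9, `mm1k_blocking(0.9, 100)` evaluates to a value on the order of 10^-6.
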

In summary, the congestion control scheme in accordance with our invention has the following properties:

(a) guarantees loop-freedom;
(b) reacts to measurements and changes paths dynamically;
(c) needs local measurements only;
(d) does not spread congestion; and
(e) carries traffic on lightly loaded links.

Indeed, the invention allows a connectionless network to efficiently carry as much traffic as possible, since
packet loss that ordinarily results from buffer overflow is reduced, and the retransmission problem is alleviated.
No additional signaling messages need to be exchanged between network nodes.

Various modifications and adaptations of the present invention will be readily apparent to those of ordinary
skill in the art. Accordingly, it is intended that the invention be limited only by the appended claims.



Claims

1. A method of routing information packets from a first node in a network of interconnected nodes to a
destination node, comprising the steps of
a) forming a first routing table containing the primary route to be taken by information packets at said
first node destined for said destination node and a second routing table containing an alternate route
to be taken by information packets when said primary route is congested;
b) monitoring congestion in said network; and
c) routing a portion of said information packets over said alternate route in the presence of congestion;
wherein said second routing table is formed by
d) determining other nodes in said network that are interconnected with said first node;
e) organizing each of said interconnected nodes including said first node into a series of layers in
accordance with their distance, in hops, to said destination node;
f) assigning a weight to each possible path between said first node and each of said other interconnected
nodes in the same layer,
g) assigning a weight to each possible path between each of said other interconnected nodes in said
same layer and a connected node in a different layer, said different layer being closer to said destination
node; and
h) selecting said alternate route by minimizing the pairwise sum of the weights obtained during said first
and second assigning steps (f) and (g) above.

2. The method of claim 1 wherein said weight assigning steps include computing the distance between nodes
using coordinate information representing the location of said nodes.

3. A method of controlling congestion in the flow of information bearing packets traveling over paths in a
network of interconnected nodes, comprising the steps of
routing packets from each node to destination nodes via multihop primary routing paths;
monitoring congestion in said nodes in said network; and
routing packets from ones of said nodes to said destinations via alternate multihop routing paths
in the event that congestion is encountered in said network;
wherein said alternate routing paths are determined by
grouping said interconnected nodes into a plurality of layers, each layer containing nodes that are
the same distance, in hops from a particular destination;
assigning a weighting factor to each path between interconnected nodes in said layers;
assigning a weighting factor to each path between interconnected nodes in adjacent layers; and
selecting said alternate routing paths as a function of combinations of said weighting factors.

4. The invention defined in claim 3 wherein said primary path contains k hops and said alternate path contains
at least k+1 hops.

5. The invention defined in claim 3 wherein said selecting step includes
forming the pairwise sum of weighting factors assigned during both of said assigning steps.

6. The invention defined in claim 3, wherein said assigning step includes:
forming said weighting factor as a function of the distance between nodes connected via said paths.

7. The invention defined in claim 3, wherein said alternate route is used only for a portion of the packets
intended for a congested primary routing path.

8. The invention defined in claim 7, wherein said method further includes the steps of
marking any packet transmitted over an alternate routing path;
examining each packet at each node before it is routed, to determine if it has been marked; and
routing marked packets only if said node is uncongested.

9. A method of selecting loop free alternate multi-hop paths for information bearing packets traveling over a
network of communication nodes, comprising the steps of
storing in each of said communication nodes information describing the connections between each
node in said network and neighboring nodes;
storing in each of said communication nodes information for assigning weights assigned to paths
between each connected pair of nodes;
grouping interconnected nodes into k layers, each layer containing nodes having the same
distance, in hops, from a potential destination;
computing, for each node in layer k, the pairwise sum of the stored weights assigned to a) paths
between said node and a first set of connected nodes in layer k; and b) paths between said first set of
connected nodes in layer k and a second set of connected nodes in layer k-1, and
selecting as the alternate route from said node in layer k to said potential destination, the path
having the smallest of said pairwise sums.

10. The invention defined in claim 9, wherein said first storing step includes forming an adaptive minimum
spanning tree representation of said network.

11. The invention defined in claim 9, wherein said second storing step includes storing coordinate information
representing the horizontal and vertical location of each of said nodes with respect to a reference system.

12. A method of reducing congestion in a connectionless network including a plurality of interconnected nodes,
comprising the steps of
associating with each node in said network, a primary route to be taken by at least a portion of traffic
from said node destined for each destination node;
associating with each node in said network, an alternate route to be taken by traffic from said node
destined for each destination node in the event that said primary route is congested;
monitoring congestion in said network, and
routing traffic on said alternate route in the event that congestion is detected;
wherein said first association step includes forming a k-hop route using adaptive minimum spanning
tree routing; and
wherein said second association step includes forming a route having at least k+1 hops, based upon
connectivity information locally available in said each node.

13. In a network of interconnected nodes in which packets are transmitted over a primary route determined
by selecting the shortest path, in hops, between originating node and the destination node, a method of
providing an alternate route in the event said primary route is congested, said method comprising the steps
of
grouping nodes between said originating node and said destination node into a plurality of groups,
such that the nodes in the kth group are equally distant, in hops, from said destination node;
assigning a weight, w(i,j) to each path between nodes i and j in group k and a weight w(j,j') to each
path between node j in group k and node j' in group k-1,
selecting said alternate path such that w(i,j)+w(j,j') is minimized.

14. A method of determining an alternate route for traffic in a connectionless network of nodes when the
primary route between said nodes is congested, comprising the steps of
for each destination, grouping said nodes as a function of the distance of said node from said
destination;
assigning a first weighting factor to each path between a node in one of said groups and each
connected node in the same group, and a second weighting factor to each path between each of said
connected nodes in the same group and other connected nodes in another of said groups; and
selecting said alternate route as a function of said first and second routing factors.

15. Apparatus for controlling congestion in the flow of information bearing packets traveling over paths in a
network of interconnected nodes, comprising
a) means for monitoring congestion in primary and secondary routing paths within said network; and
b) means for routing packets from each node to destination nodes via multihop primary routing paths
in the absence of congestion and for routing packets from ones of said nodes to said destinations via
alternate multihop routing paths in the event that congestion is encountered in said primary routing
paths;
wherein said routing means includes
means for grouping said interconnected nodes into a plurality of layers, each layer containing nodes
that are the same distance, in hops from a particular destination;
means for assigning a weighting factor to each path between interconnected nodes in said layers,
and for assigning a weighting factor to each path between interconnected nodes in adjacent layers; and
means for selecting said alternate routing paths as a function of combinations of said weighting
factors.

16. The invention defined in claim 15 wherein said primary path contains k hops and said alternate path
contains at least k+1 hops.

17. The invention defined in claim 15 wherein said selecting means includes
means for forming pairwise sums of said weighting factors for
a) paths between nodes in the same layer, and
b) paths between nodes in adjacent layers.

18. The invention defined in claim 15, wherein said assigning means includes:
means for forming said weighting factor as a function of the distance between nodes connected
via said paths.

19. The invention defined in claim 15, wherein said routing means is arranged so that said alternate route is
used only for a portion of the packets intended for a congested primary routing path.

20. The invention defined in claim 19, wherein said apparatus further includes
means for marking any packet transmitted over an alternate routing path;
means for examining each packet at each node before it is routed, to determine if it has been
marked; and
means for routing marked packets only if said node is uncongested.

21. Apparatus for selecting loop free alternate multi-hop paths for information bearing packets traveling over
a network of communication nodes, comprising
means for storing in each of said communication nodes (a) information describing the connections
between each node in said network and neighboring nodes, and (b) information for assigning weights to
paths between each connected pair of nodes;
means for grouping interconnected nodes into k layers, each layer containing nodes having the
same distance, in hops, from a potential destination;
means for computing, for each node in layer k, the pairwise sum of the stored weights assigned to
a) paths between said node and a first set of connected nodes in layer k; and b) paths between said first
set of connected nodes in layer k and a second set of connected nodes in layer k-1, and
means for selecting as the alternate route from said node in layer k to said potential destination,
the path having the smallest of said pairwise sums.

22. Apparatus for reducing congestion in a connectionless network including a plurality of interconnected
nodes, comprising
means for associating with each node in said network a) a primary route to be taken by at least a
portion of traffic from said node destined for each destination node, and b) an alternate route to be taken
by traffic from said node destined for each destination node in the event that said primary route is
congested;
means for monitoring congestion in said network, and
means for routing traffic on said alternate route in the event that congestion is detected;
wherein said associating means includes (a) means for forming a k-hop route using adaptive
minimum spanning tree routing, and (b) means for forming a route having at least k+1 hops based upon
connectivity information locally available in said each node.

23. In a network of interconnected nodes in which packets are transmitted over a primary route determined
by selecting the shortest path, in hops, between the originating node and the destination node, apparatus
for providing an alternate route in the event said primary route is congested, said apparatus comprising
means for grouping nodes between said originating node and said destination node into a plurality
of groups, such that the nodes in the kth group are equally distant, in hops, from said destination node;
means for assigning a weight, w(i,j) to each path between nodes i and j in group k and a weight
w(j,j') to each path between node j in group k and node j' in group k-1, and
means for selecting said alternate path such that w(i,j)+w(j,j') is minimized.

24. Apparatus for determining an alternate route for traffic in a connectionless network of nodes when the
primary route between said nodes is congested, comprising
for each destination, means for grouping said nodes as a function of the distance of said node from
said destination;
means for assigning a first weighting factor to each path between a node in one of said groups and
each connected node in the same group, and a second weighting factor to each path between each of
said connected nodes in the same group and other connected nodes in another of said groups; and
means for selecting said alternate route as a function of said first and second weighting factors.